Virtual environments

The system installation of Python and its associated packages is not always what you need. Scientific software usually develops faster than operating systems are released, and its release schedule is not aligned with theirs (true for both Mac and Linux). On Mac the situation is further complicated by the inclusion of both 32-bit and 64-bit Python libraries.

The solution is to create an isolated, (almost) self-contained environment that has all the packages you need for a particular project. Such an environment is called a virtual environment, or virtualenv for short.

Aside: package management

The program that tracks the list and the state of the packages installed on a system is called a package manager.

  • apt, apt-get, aptitude - Debian-based Linux distributions
  • brew (Homebrew) - for Mac
  • ?? - for Windows

These tools install, upgrade and uninstall software system-wide, resolving dependencies on the fly.

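However, Python programs and packages are also distributed via pip. pip is cross-platform, while the tools above are platform-specific. Mixing the two system-wide is risky: neither tool knows what the other has installed, so pip can overwrite or break packages that the OS package manager put in place.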

The current best practice is to install system-level software with your OS package manager and to use pip only to install packages inside virtualenvs.

venv (https://docs.python.org/3/library/venv.html) (standard-library successor to virtualenv)

This is the original, "pythonic" way to do this. venv has been part of the standard library since Python 3.3, and since Python 3.4 it also installs pip into new environments by default.

create a new environment

The pyvenv script is deprecated as of Python 3.6 (and was removed in Python 3.8). Use python3 -m venv /path/to/environment instead:

$ python3 -m venv ~/.venv/biodata3
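
The same step can also be driven from Python through the standard-library venv module. The following is only a minimal sketch: the helper name make_env and the temporary target directory are illustrative assumptions, and with_pip=False is chosen just to keep the example quick and offline (pass with_pip=True to get pip inside the environment, as python3 -m venv does by default).

```python
import tempfile
import venv
from pathlib import Path


def make_env(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return its path.

    make_env is a hypothetical helper; venv.create is the stdlib call
    that does the actual work.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Build a throwaway environment in a temporary directory (an assumption
# for this example; a real project would use a path like ~/.venv/biodata3).
with tempfile.TemporaryDirectory() as tmp:
    env = make_env(Path(tmp) / "biodata3")
    # Every venv has a pyvenv.cfg file at its top level.
    has_cfg = (env / "pyvenv.cfg").is_file()
    print("pyvenv.cfg present:", has_cfg)
```

The temporary directory is deleted when the with block ends, so nothing is left behind.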

activate environment

$ source ~/.venv/biodata3/bin/activate
(biodata3)$ deactivate

install/update packages

(biodata3)$ pip install -U pip
(biodata3)$ pip install jupyter

generate requirements file

(biodata3)$ pip freeze > requirements.txt
(biodata3)$ less requirements.txt
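
Each line that pip freeze writes normally has the form name==version. As a rough illustration of that format, the sketch below reads such lines into a dictionary. The function name parse_pinned and the sample package versions are made up for this example, and the parser deliberately ignores anything that is not a plain name==version pin.

```python
def parse_pinned(text):
    """Map package name -> pinned version for 'name==version' lines.

    Blank lines, comments, option lines (starting with '-'), and lines
    without '==' are skipped.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Made-up sample in the style of pip freeze output.
sample = """\
# example requirements file
jupyter==1.0.0
numpy==1.13.1
-e git+https://github.com/eco32i/ggplot.git@rewrite#egg=ggplot
"""
pins = parse_pinned(sample)
print(pins)
```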

what's inside?

(biodata3)$ ls -lah ~/.venv/biodata3
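
Among the files listed, pyvenv.cfg at the top level records which base interpreter the environment was built from and whether the system site-packages are visible inside it. The sketch below parses its simple key = value format; the function name read_pyvenv_cfg and the sample values are assumptions for illustration, and the real values depend on your Python installation.

```python
def read_pyvenv_cfg(text):
    """Parse the 'key = value' lines of a pyvenv.cfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' sign
            settings[key.strip()] = value.strip()
    return settings


# Made-up file contents; real values depend on your Python install.
sample_cfg = """\
home = /usr/bin
include-system-site-packages = false
version = 3.6.2
"""
cfg = read_pyvenv_cfg(sample_cfg)
print(cfg["include-system-site-packages"])
```

To inspect a real environment, pass in the text of ~/.venv/biodata3/pyvenv.cfg instead of the sample string.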

how to install many packages/recreate environment

(biodata3)$ pip install -r requirements.txt

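sometimes the order of installation is important and pip can't figure it out on its own; in that case install the packages one at a time, in the order they are listed (xargs -L 1 runs pip once per line; -n and -L are mutually exclusive, so only one of them should be given):

(biodata3)$ cat requirements.txt | xargs -L 1 pip install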
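
The one-package-at-a-time approach can be sketched in Python as well. The helper below only builds the commands, one pip invocation per requirement line in file order, and does not run them; the name pip_commands and the sample requirement lines are hypothetical.

```python
import shlex
import sys


def pip_commands(requirements_text, python=sys.executable):
    """Return one 'pip install' argv list per requirement, in file order."""
    commands = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # shlex.split keeps multi-word lines such as "-e ." as separate args.
        commands.append([python, "-m", "pip", "install", *shlex.split(line)])
    return commands


# Made-up requirements, listed in the order they should be installed.
sample = "numpy==1.13.1\n# comment\n\npandas==0.20.3\n"
commands = pip_commands(sample, python="python")
for cmd in commands:
    print(" ".join(cmd))
    # To actually install, pass each list to subprocess.run(cmd, check=True).
```

Calling python -m pip rather than a bare pip makes sure the packages go into the environment of the interpreter that is running.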

install from github repo

(biodata3)$ pip install git+https://github.com/eco32i/ggplot.git@rewrite

or clone the repo and install locally like so:

(biodata3)$ git clone https://github.com/eco32i/ggplot.git
(biodata3)$ cd ggplot && git checkout rewrite
(biodata3) ggplot$ pip install -e .

conda

This tool is developed and maintained by Anaconda, Inc. (formerly Continuum Analytics), the company behind the Anaconda Python distribution. First install Miniconda to keep things nice and lean: https://conda.io/miniconda.html

create a new environment

$ conda create --name biodata3 python=3 pandas
$ conda create --name biodata python=2 numpy matplotlib

list environments

$ conda info --envs

activate environment

$ source activate biodata3
(biodata3)$ source deactivate

With conda 4.4 and later, use conda activate biodata3 and conda deactivate instead.

once inside the environment you can use both conda and pip

(biodata3)$ conda install numpy
(biodata3)$ pip install ipython

generate requirements file

(biodata3)$ conda env export > environment.yml
(biodata3)$ less environment.yml

how to install many packages/recreate environment

$ conda env create -f environment.yml

deactivate environment

(biodata3)$ source deactivate
$

Whatever you install while an environment is active is available only inside that environment. Note how the output of the which python command changes when you are inside an environment.
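
The same check can be made from inside Python. In an environment created with venv, sys.prefix points at the environment while sys.base_prefix still points at the interpreter it was created from, so comparing the two tells you whether one is active. This is a minimal sketch with a hypothetical helper name, in_virtualenv; it does not detect conda environments, where sys.prefix itself simply points into the environment.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv-style environment."""
    # Inside a venv the two prefixes differ; outside they are equal.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("prefix:     ", sys.prefix)
print("in venv:    ", in_virtualenv())
```

Run it once before and once after activating an environment to see the interpreter path and prefix change.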

